Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
Authors
Abstract
A key element in the solution of reinforcement learning problems is the value function. The purpose of this function is to measure the long-term utility or value of any given state, and it is important because an agent can use this measure to decide what to do next. A common problem in reinforcement learning when applied to systems having continuous state and action spaces is that the value function must operate with a domain consisting of real-valued variables, which means that it should be able to represent the value of infinitely many state-action pairs. For this reason, function approximators are used to represent the value function when a closed-form solution of the optimal policy is not available. In this paper we extend a previously proposed reinforcement learning algorithm so that it can be used with function approximators that generalize the value of individual experiences across both state and action spaces. In particular, we discuss the benefits of using sparse coarse-coded function approximators to represent value functions and describe in detail three implementations: CMAC, instance-based, and case-based. Additionally, we discuss how function approximators having different degrees of resolution in different regions of the state and action spaces may influence the performance and learning efficiency of the agent. We propose a simple and modular technique that can be used to implement function approximators with non-uniform degrees of resolution, so that the value function can be represented with higher accuracy in important regions of the state and action spaces. We performed extensive experiments in the double integrator and pendulum swing-up systems to demonstrate the proposed ideas.
Keywords: reinforcement learning, function approximation, memory-based methods, continuous domains, optimal control, resource preallocation
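Since the abstract singles out CMAC among the three sparse coarse-coded implementations, a small tile-coding sketch may help fix ideas. This is a generic illustration of the technique, not the authors' code: the class and parameter names are invented, and details such as hashing and per-dimension tile counts are deliberately simplified.

```python
# A minimal CMAC (tile-coding) action-value approximator over a joint
# state-action vector, in the spirit of sparse coarse coding.
import numpy as np

class TileCodedQ:
    def __init__(self, low, high, n_tiles=8, n_tilings=8, alpha=0.1):
        self.low = np.asarray(low, dtype=float)    # per-dimension lower bounds
        self.high = np.asarray(high, dtype=float)  # per-dimension upper bounds
        self.n_tiles = n_tiles                     # tiles per dimension, per tiling
        self.n_tilings = n_tilings                 # number of offset tilings
        self.alpha = alpha / n_tilings             # step size shared across tilings
        dims = len(self.low)
        # One weight table per tiling; each tiling partitions the joint
        # state-action space into n_tiles**dims axis-aligned tiles.
        self.w = np.zeros((n_tilings,) + (n_tiles,) * dims)
        # Each tiling is shifted by a fraction of a tile per dimension.
        self.offsets = np.array([[t / n_tilings] * dims for t in range(n_tilings)])

    def _active_tiles(self, x):
        # Map a joint state-action vector to one tile index per tiling.
        scaled = (np.asarray(x) - self.low) / (self.high - self.low)  # -> [0, 1]
        idx = []
        for t in range(self.n_tilings):
            shifted = scaled * (self.n_tiles - 1) + self.offsets[t]
            idx.append(tuple(np.clip(shifted.astype(int), 0, self.n_tiles - 1)))
        return idx

    def value(self, x):
        # Q(s, a): sum of the weights of the active tiles (sparse, coarse-coded).
        return sum(self.w[t][i] for t, i in enumerate(self._active_tiles(x)))

    def update(self, x, target):
        # Gradient step toward the target; nearby inputs share tiles,
        # so each update generalizes to neighboring state-action pairs.
        err = target - self.value(x)
        for t, i in enumerate(self._active_tiles(x)):
            self.w[t][i] += self.alpha * err
```

For the pendulum swing-up, for instance, the input x would concatenate the state (angle, angular velocity) with the torque, so each update generalizes to nearby state-action pairs through the overlapping tilings.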
Similar resources
Reinforcement Learning in Continuous State and Action Spaces
Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to ...
Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm
In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
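The core of DDPG is the deterministic policy gradient. The sketch below shows that update with linear approximators in plain numpy rather than the deep networks, replay buffer, and target networks a full DDPG implementation would add; all names and step sizes here are illustrative, not taken from the cited paper.

```python
# Minimal deterministic policy gradient step with linear critic and actor.
# Critic: Q(s, a) = w_s . s + w_a . a; actor: pi(s) = theta @ s.
# A simplified sketch; DDPG replaces both with neural networks.
import numpy as np

def dpg_step(s, r, s2, w_s, w_a, theta, alpha_q=0.01, alpha_pi=0.001, gamma=0.99):
    a = theta @ s                          # action from the current policy
    q = w_s @ s + w_a @ a                  # critic estimate Q(s, a)
    q2 = w_s @ s2 + w_a @ (theta @ s2)     # bootstrap from the next state
    td = r + gamma * q2 - q                # TD error
    w_s = w_s + alpha_q * td * s           # semi-gradient TD(0) critic update
    w_a = w_a + alpha_q * td * a
    # Actor: grad_theta pi(s) * grad_a Q(s, a); here grad_a Q = w_a, and
    # grad_theta (theta @ s) gives the outer product with s.
    theta = theta + alpha_pi * np.outer(w_a, s)
    return w_s, w_a, theta
```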
Hierarchical Policy Gradient Algorithms
Hierarchical reinforcement learning is a general framework which attempts to accelerate policy learning in large domains. On the other hand, policy gradient reinforcement learning (PGRL) methods have received recent attention as a means to solve problems with continuous state spaces. However, they suffer from slow convergence. In this paper, we combine these two approaches and propose a family ...
Improved State Aggregation with Growing Neural Gas in Multidimensional State Spaces
Q-Learning is a widely used method for dealing with reinforcement learning problems. However, the conditions for its convergence include an exact representation and sufficiently (in theory even infinitely) many visits of each state-action pair—requirements that raise problems for large or continuous state spaces. To speed up learning and to exploit gained experience more efficiently it is highl...
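To make those convergence conditions concrete, here is a minimal tabular Q-learning loop (a generic sketch, not code from the cited paper); the environment interface and all names are assumed for illustration. Every (state, action) pair gets its own table entry, which is exactly what becomes infeasible for large or continuous spaces.

```python
# Tabular Q-learning: one value per (state, action) pair, so every pair
# must keep being visited for convergence. Assumes a hypothetical env
# with reset() -> state and step(a) -> (next_state, reward, done).
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((n_states, n_actions))    # exact, exhaustive representation
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration keeps all (s, a) pairs visited
            a = np.random.randint(n_actions) if np.random.rand() < eps \
                else int(Q[s].argmax())
            s2, r, done = env.step(a)
            # one-step update toward r + gamma * max_a' Q(s', a')
            target = r + gamma * (0.0 if done else Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```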
Continuous-action reinforcement learning with fast policy search and adaptive basis function selection
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open problem and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In t...
Journal: Adaptive Behaviour
Volume: 6, Issue: -
Pages: -
Publication date: 1997